# Reinforcement learning fine-tuning

**Unireason Qwen3 14B RL GGUF** (mradermacher, Apache-2.0)
A static quantization of UniReason-Qwen3-14B-RL, suitable for text generation and mathematical reasoning research.
Tags: Large Language Model, Transformers, English · Downloads: 272 · Likes: 1
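
Static GGUF quantizations like this one are usually run with a llama.cpp-compatible runtime. Below is a minimal sketch using llama-cpp-python; the local file name and the sampling settings are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch: running a static GGUF quantization with llama-cpp-python.
# The file name is a placeholder; use whichever quant file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="UniReason-Qwen3-14B-RL.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Prove that the sum of two odd integers is even."}],
    max_tokens=512,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```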
**Vigorl 7b Spatial** (gsarch)
ViGoRL is a vision-language model fine-tuned with reinforcement learning to explicitly tie textual reasoning steps to visual coordinates, enabling precise visual reasoning and grounding.
Tags: Text-to-Image, Transformers · Downloads: 319 · Likes: 1
**Deepseek R1 Distill Qwen 14B GRPO Taiwan Spirit** (kartd)
A fine-tuned version of DeepSeek-R1-Distill-Qwen-14B trained with the GRPO method, suitable for text generation tasks.
Tags: Large Language Model, Transformers · Downloads: 111 · Likes: 1
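
GRPO fine-tunes like this are commonly produced with Hugging Face TRL's `GRPOTrainer`. The sketch below shows the general pattern with a toy length-based reward; the base model, dataset, and reward function are placeholders, not the recipe behind this checkpoint.

```python
# Minimal GRPO sketch with TRL; model, dataset, and reward are illustrative.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_brevity(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) / 200.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # any dataset with a "prompt" column

args = GRPOConfig(
    output_dir="grpo-demo",
    per_device_train_batch_size=4,
    num_generations=4,  # completions sampled per prompt for the group-relative baseline
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small stand-in base model
    reward_funcs=reward_brevity,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```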
**Codev R1 Qwen 7B** (zhuyaoyu)
CodeV-R1-Qwen-7B is obtained by reinforcement-learning fine-tuning of Qwen/Qwen2.5-Coder-7B-Instruct within the CodeV-R1 framework. It targets Verilog-related tasks, addressing the automatic generation of hardware description languages in electronic design automation.
Tags: Large Language Model, Transformers · Downloads: 138 · Likes: 1
**Xgen Small 9B Instruct R** (Salesforce)
xGen-small is an enterprise-grade compact language model that delivers long-context performance at predictably low cost through domain-focused data curation, scalable pre-training, length extension, and reinforcement learning fine-tuning.
Tags: Large Language Model, Transformers, English · Downloads: 97 · Likes: 4
**Phi 4 Reasoning Plus GGUF** (lmstudio-community, MIT)
Phi-4-reasoning-plus is a large language model developed by Microsoft with enhanced reasoning capabilities, specifically optimized for complex mathematical problems and multi-step reasoning tasks.
Tags: Large Language Model, Multilingual · Downloads: 5,205 · Likes: 4
**Openhands Lm 7b V0.1 GGUF** (Mungert, MIT)
OpenHands LM is an open-source coding model built on the Qwen2.5 Coder Instruct series, which performs strongly on software engineering tasks thanks to targeted fine-tuning.
Tags: Large Language Model, English · Downloads: 1,131 · Likes: 2
**Ablation 141 A128.dpo.armorm.rp Shisa V2 Llama 3.1 8b** (shisa-ai)
A language model fine-tuned with the DPO method, suitable for text generation tasks.
Tags: Large Language Model, Transformers · Downloads: 38 · Likes: 2
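
DPO fine-tuning of this kind is typically run with TRL's `DPOTrainer` on a dataset of chosen/rejected preference pairs. A minimal sketch, assuming a small stand-in base model and a public preference dataset rather than the actual Shisa V2 data:

```python
# Minimal DPO sketch with TRL; the model and preference dataset are stand-ins.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data with chosen/rejected responses.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-demo",
    per_device_train_batch_size=2,
    beta=0.1,  # strength of the KL-style penalty toward the reference policy
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```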
**Ice0.101 20.03 RP GRPO 1** (icefog72, Apache-2.0)
A Mistral-based model optimized with the Unsloth framework and Hugging Face's TRL training library, trained roughly 2x faster.
Tags: Large Language Model, Transformers, English · Downloads: 55 · Likes: 2
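
The 2x figure refers to Unsloth's optimized fine-tuning path, which is usually paired with TRL roughly as below. The base checkpoint, LoRA settings, and dataset are assumptions for illustration, not this model's recipe, and exact argument names vary slightly across TRL versions.

```python
# Rough Unsloth + TRL sketch; base model, LoRA settings, and data are assumptions.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load a 4-bit base model through Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.3-bnb-4bit",  # illustrative base
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder SFT data

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(output_dir="unsloth-demo", per_device_train_batch_size=2),
)
trainer.train()
```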
**Llama 3.1 Tulu 3.1 8B** (allenai)
Tülu 3 is a leading family of instruction-following models with fully open data, code, and training recipes, intended as a comprehensive guide to modern post-training techniques. Version 3.1 improves the reinforcement learning stage, delivering better overall performance.
Tags: Large Language Model, Transformers, English · Downloads: 3,643 · Likes: 33
**Ppo Tldr** (vwxyzjn)
A fine-tuned version of EleutherAI/pythia-1b-deduped, trained with PPO to generate concise TL;DR summaries.
Tags: Large Language Model, Transformers · Downloads: 15 · Likes: 1
**Llama 3 NeuralPaca 8b** (NeuralNovel)
A model based on Meta Llama-3-8B, trained with the Unsloth optimization framework and the Hugging Face TRL library for roughly a 2x training speedup.
Tags: Large Language Model, Transformers, English · Downloads: 21 · Likes: 7
**Blip Image Captioning Base Mocha** (moranyanuka, MIT)
Official checkpoint of the BLIP base model, fine-tuned on MS-COCO with the MOCHA reinforcement learning framework to mitigate open-vocabulary caption hallucination.
Tags: Image-to-Text, Transformers · Downloads: 88 · Likes: 1
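
A MOCHA-tuned BLIP checkpoint is used just like the standard BLIP captioning model in transformers. A minimal sketch, assuming the repo id `moranyanuka/blip-image-captioning-base-mocha` and a placeholder image URL:

```python
# Minimal captioning sketch; the repo id and image URL are assumptions.
import requests
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

repo_id = "moranyanuka/blip-image-captioning-base-mocha"  # assumed repo id
processor = BlipProcessor.from_pretrained(repo_id)
model = BlipForConditionalGeneration.from_pretrained(repo_id)

url = "https://example.com/cat.jpg"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```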